
Module 02 - Microservices with Python

The Monolith That Broke on a Tuesday

It starts innocently. Your document processing application handles uploads, runs OCR, classifies documents, sends notifications, and stores results - all in one Flask app. Deploys take 40 minutes. A bug in the notification code brings down OCR. A memory leak in the classification model crashes the upload handler. You cannot scale the CPU-heavy processing tier independently of the lightweight notification sender.

Then Tuesday comes. A spike in document uploads saturates the OCR workers. The entire application becomes unresponsive. Users cannot even check their upload history - that endpoint lives in the same overloaded process.

This module is about what you build instead, and more importantly, how and when to build it.

What You Will Learn

| Lesson | Topic | Key Skills Gained |
| --- | --- | --- |
| 01 | FastAPI in Depth | DI patterns, lifespan events, middleware, exception handlers, OpenAPI customisation |
| 02 | gRPC with Python | Protocol Buffers, all four streaming patterns, interceptors, error mapping |
| 03 | Event-Driven Architecture | Kafka, Redis Streams, Event Sourcing, CQRS, Saga pattern |
| 04 | Service Mesh Patterns | Circuit breakers, retry with jitter, bulkheads, OpenTelemetry tracing |
| 05 | API Versioning and Contracts | Pact contract testing, schema evolution, SDK generation, deprecation |

Prerequisites: Python async fundamentals (Module 1 of this series), Docker basics, HTTP fundamentals.

Time commitment: ~12 hours of focused study, ~8 hours of hands-on project work.

The Migration: From Monolith to Four Services

The best way to understand microservice boundaries is to watch a real extraction. Here is DocumentProcessingMonolith - a class doing eight things that should never be owned by a single deployable unit.

# BEFORE: The Monolith - one class, eight responsibilities
# Every deployment touches every capability.
# Every bug can affect every user.
# You cannot scale OCR independently of email sending.

from datetime import datetime, timezone


class DocumentProcessingMonolith:
    def __init__(self, db_conn, email_client, storage_client, ocr_engine, classifier):
        self.db = db_conn
        self.email = email_client
        self.storage = storage_client
        self.ocr = ocr_engine
        self.classifier = classifier

    def process_document(self, file_bytes: bytes, user_id: str, filename: str) -> dict:
        # Responsibility 1: Validate input
        if len(file_bytes) > 50 * 1024 * 1024:
            raise ValueError("File too large")
        if not filename.endswith((".pdf", ".png", ".jpg")):
            raise ValueError("Unsupported format")

        # Responsibility 2: Store raw file
        storage_key = f"raw/{user_id}/{filename}"
        self.storage.put(storage_key, file_bytes)

        # Responsibility 3: Run OCR (CPU-heavy, 2–30 seconds)
        text = self.ocr.extract_text(file_bytes)

        # Responsibility 4: Extract metadata (helper method omitted here)
        metadata = self._extract_metadata(file_bytes)

        # Responsibility 5: Generate thumbnail (also CPU-heavy; helper omitted here)
        thumbnail = self._generate_thumbnail(file_bytes)
        self.storage.put(f"thumbs/{user_id}/{filename}.jpg", thumbnail)

        # Responsibility 6: Classify document (ML inference)
        label = self.classifier.classify(text)

        # Responsibility 7: Write audit log
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
        self.db.execute(
            "INSERT INTO audit_log(user_id, filename, label, ts) VALUES (%s, %s, %s, %s)",
            (user_id, filename, label, datetime.now(timezone.utc)),
        )

        # Responsibility 8: Send notification email
        email = self.db.fetchone(
            "SELECT email FROM users WHERE id = %s", (user_id,)
        )["email"]
        self.email.send(
            to=email,
            subject="Your document is ready",
            body=f"Document '{filename}' classified as: {label}",
        )

        return {"storage_key": storage_key, "label": label, "text": text[:500]}

The problems are architectural, not stylistic. You cannot:

  • Scale OCR (4 CPUs minimum) without also scaling email sending (almost zero CPU).
  • Deploy a new ML classification model without touching the upload handler.
  • Upgrade notification templates without running the full OCR regression suite.
  • Have the ML team own the classifier independently of the infra team owning storage.
  • Isolate a memory leak in the classifier from affecting uploads.

AFTER: Four Services with Clear Boundaries

┌──────────────────────────────────────────────────────────────────────┐
│                         API Gateway / Nginx                          │
│           (TLS termination, auth, rate limiting, routing)            │
└──────────────────┬────────────────────────────────────┬──────────────┘
                   │ REST/JSON                          │ REST/JSON
                   ▼                                    ▼
┌──────────────────────────┐              ┌────────────────────────────┐
│      Upload Service      │              │   Classification Service   │
│     FastAPI · Python     │              │  FastAPI + gRPC · Python   │
│                          │              │                            │
│ • File validation        │              │ • ML model serving         │
│ • Virus scanning         │              │ • Text → label mapping     │
│ • S3/GCS storage         │              │ • Confidence scores        │
│ • Publishes:             │              │ • Model version tracking   │
│     doc.uploaded event   │              │ • gRPC for internal calls  │
│                          │              │                            │
│ CPU: low   RAM: 256 MB   │              │ CPU: high  RAM: 2 GB       │
│ Replicas: 2              │              │ Replicas: 4                │
└──────────────┬───────────┘              └────────────────┬───────────┘
               │                                           │
               │     ┌─────────────────────────┐           │
               │     │     Message Broker      │           │
               └────►│          Kafka          │◄──────────┘
                     │                         │
                     │ Topics:                 │
                     │ • doc.uploaded          │
                     │ • doc.processed         │
                     │ • doc.classified        │
                     └────────────┬────────────┘
                                  │
           ┌──────────────────────┴─────────────────────┐
           ▼                                            ▼
┌────────────────────────────┐           ┌─────────────────────────────┐
│     Processing Service     │           │    Notification Service     │
│      FastAPI · Python      │           │      FastAPI · Python       │
│                            │           │                             │
│ • OCR (pytesseract)        │           │ • Email (SendGrid)          │
│ • PDF text extraction      │           │ • SMS (Twilio)              │
│ • Thumbnail generation     │           │ • In-app push notifications │
│ • Metadata extraction      │           │ • User preference lookup    │
│ • Publishes:               │           │ • Template rendering        │
│     doc.processed event    │           │                             │
│                            │           │ CPU: low   RAM: 128 MB      │
│ CPU: very high  RAM: 1 GB  │           │ Replicas: 1                 │
│ Replicas: 8                │           │                             │
└────────────────────────────┘           └─────────────────────────────┘

Each service:

  • Owns its own data store - no shared database
  • Deploys independently on its own CI/CD pipeline
  • Scales based on its own resource bottleneck
  • Is owned by one team with full autonomy
  • Can be rewritten or replaced without affecting other services

This is the architecture you will build, piece by piece, across this module.
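
The services above talk to each other through the three Kafka topics rather than through direct calls, so the message format becomes part of each service's contract. As a rough sketch of what a message on doc.uploaded might carry, assuming a JSON encoding: the `DocUploaded` class and its field names are hypothetical and chosen for illustration, not a schema defined by this module.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DocUploaded:
    """Hypothetical payload for the doc.uploaded topic (illustrative only)."""

    storage_key: str
    user_id: str
    filename: str
    # A unique ID lets consumers recognise and skip redelivered events.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_bytes(self) -> bytes:
        # Message brokers carry bytes, so serialise to UTF-8 encoded JSON.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "DocUploaded":
        return cls(**json.loads(raw.decode("utf-8")))


event = DocUploaded("raw/u1/invoice.pdf", "u1", "invoice.pdf")
wire = event.to_bytes()
# A consumer in another service rebuilds the same event from the bytes.
assert DocUploaded.from_bytes(wire) == event
```

Carrying a storage key instead of the file bytes keeps messages small; the Processing Service would fetch the file from object storage itself.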

When to Use Microservices vs Monolith

This is the most consequential decision in this module. Most teams reach for microservices too early, creating distributed monoliths - all the operational complexity of distributed systems, with none of the independent deployment or scaling benefits.

The Decision Matrix

| Factor | Choose Monolith | Choose Microservices |
| --- | --- | --- |
| Team size | Under 8 engineers | Multiple teams, 15+ engineers |
| Deployment frequency | Infrequent, coordinated | Teams deploy independently, 20+ times/day |
| Scale requirements | Roughly uniform | Wildly different per component |
| Domain clarity | Still discovering the model | Well-understood bounded contexts |
| Operational maturity | No Kubernetes expertise | Strong DevOps culture, observability in place |
| Data ownership | Shared DB acceptable | Clear ownership, teams own their schema |
| Development speed | Ship an MVP fast | Independent team velocity matters |

The rule: If your team cannot draw bounded contexts on a whiteboard without a 30-minute argument, you are not ready for microservices.

Start with a modular monolith - well-separated modules inside one deployable, with strict interface boundaries. Extract to services when you have evidence of the need.

# A modular monolith: one deployment, clean internal interfaces
# When you later extract to a service, you only change the adapter

# upload/ports.py - defines what upload module needs from outside
from abc import ABC, abstractmethod


class StoragePort(ABC):
    @abstractmethod
    async def put(self, key: str, data: bytes) -> str: ...


class EventPort(ABC):
    @abstractmethod
    async def publish(self, topic: str, event: dict) -> None: ...


# upload/service.py - business logic, no infrastructure knowledge
class UploadService:
    def __init__(self, storage: StoragePort, events: EventPort):
        self._storage = storage
        self._events = events

    async def upload(self, file_bytes: bytes, filename: str, user_id: str) -> str:
        key = f"raw/{user_id}/{filename}"
        await self._storage.put(key, file_bytes)
        await self._events.publish("doc.uploaded", {
            "key": key, "user_id": user_id, "filename": filename
        })
        return key

# In a monolith: events are in-process function calls
# When you extract: events become Kafka messages
# The UploadService code does not change - only the EventPort adapter changes

The EventPort abstraction is the key. Today it calls a function in-process. Tomorrow it sends a Kafka message. The service logic is identical.
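
To make the "today it calls a function in-process" half concrete, here is one possible in-process adapter. `InProcessEventBus`, its `subscribe` method, and the `Handler` alias are hypothetical names introduced for this sketch; only the `publish` signature comes from the `EventPort` definition, which is repeated so the snippet runs on its own.

```python
import asyncio
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]


class EventPort(ABC):
    # Same interface as upload/ports.py, repeated so this snippet is standalone.
    @abstractmethod
    async def publish(self, topic: str, event: dict) -> None: ...


class InProcessEventBus(EventPort):
    """Hypothetical monolith adapter: publish() awaits local handlers directly."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    async def publish(self, topic: str, event: dict) -> None:
        # No broker involved: every subscriber runs in the caller's process.
        for handler in self._handlers[topic]:
            await handler(event)


async def demo() -> list[dict]:
    received: list[dict] = []

    async def on_uploaded(event: dict) -> None:
        received.append(event)

    bus = InProcessEventBus()
    bus.subscribe("doc.uploaded", on_uploaded)
    await bus.publish("doc.uploaded", {"key": "raw/u1/a.pdf", "user_id": "u1"})
    return received


print(asyncio.run(demo()))
```

A Kafka-backed adapter would implement the same `publish` method and hand the event to a producer instead. One difference worth remembering: here a failing handler raises inside `publish`, whereas with a broker the publisher never sees consumer failures.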

CAP Theorem: The Constraint Every Microservice Engineer Must Internalize

In a distributed system, you can guarantee at most two of these three properties simultaneously:

| Property | Meaning | Real-World Definition |
| --- | --- | --- |
| Consistency | Every read reflects the latest write | All nodes return the same data at the same moment |
| Availability | Every request receives a response | System responds even when some nodes fail |
| Partition Tolerance | System operates despite network failures | Continues working when the network splits nodes |

Network partitions are inevitable in distributed systems. You will have network failures. So partition tolerance is not optional - the practical choice is always between C and A when a partition occurs:

  • CP systems (Consistency + Partition Tolerance): Refuse to answer during a partition rather than return stale data. Examples: HBase, ZooKeeper, etcd, PostgreSQL with synchronous replication. Use for: distributed locks, financial transactions, leader election.
  • AP systems (Availability + Partition Tolerance): Return potentially stale data rather than refuse to respond. Examples: Cassandra, DynamoDB, CouchDB. Use for: shopping carts, user sessions, activity feeds, search indexes.

In the document intelligence platform:

| Data | Choice | Reason |
| --- | --- | --- |
| Classification label in read model | AP | Slight staleness is fine; user sees updated label within seconds |
| Billing record for processing charge | CP | Must be accurate; better to show an error than charge incorrectly |
| Audit log entries | AP with eventual consistency | Events arrive in order eventually; availability > immediate consistency |
| User session token | AP | Returning a slightly stale token is better than login failing |
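
The difference between the two choices shows up in what a read does when the authoritative copy is unreachable. The toy model below is a sketch under heavy simplification: `Replica`, `PartitionError`, `read_cp`, and `read_ap` are hypothetical names, and a boolean flag stands in for a real network partition.

```python
class PartitionError(Exception):
    """Raised when the primary copy of the data cannot be reached."""


class Replica:
    """Toy store: an authoritative primary value plus a possibly stale local copy."""

    def __init__(self, primary_value: str, local_copy: str) -> None:
        self.primary_value = primary_value
        self.local_copy = local_copy
        self.partitioned = False  # flip to True to simulate a network split

    def read_primary(self) -> str:
        if self.partitioned:
            raise PartitionError("primary unreachable")
        return self.primary_value


def read_cp(replica: Replica) -> str:
    # CP: only the primary may answer; during a partition the error propagates.
    return replica.read_primary()


def read_ap(replica: Replica) -> str:
    # AP: always answer, falling back to the local copy when the primary is cut off.
    try:
        return replica.read_primary()
    except PartitionError:
        return replica.local_copy


label = Replica(primary_value="invoice", local_copy="unclassified")
label.partitioned = True

print(read_ap(label))  # answers with the stale local copy
try:
    read_cp(label)
except PartitionError as exc:
    print(f"CP read refused: {exc}")
```

For the classification label, the stale answer is acceptable. For the billing record, the refusal is the behaviour you want, and the caller shows an error.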

The 8 Fallacies of Distributed Computing

L. Peter Deutsch and colleagues at Sun Microsystems catalogued seven assumptions developers incorrectly make about networks, and James Gosling later added an eighth. Each one will cause production incidents if ignored.

| # | Fallacy | The Truth | How to Defend Against It |
| --- | --- | --- | --- |
| 1 | The network is reliable | Packets are dropped, connections reset, routers fail | Retries with exponential backoff; idempotent operations |
| 2 | Latency is zero | Cross-datacenter calls: 5–100 ms; cross-pod: 0.5–2 ms | Async messaging; connection pooling; caching; batching |
| 3 | Bandwidth is infinite | Large payloads are expensive and slow | Pagination; compression (gzip/brotli); streaming; binary protocols |
| 4 | The network is secure | Traffic can be intercepted, replayed, spoofed | mTLS between services; service-to-service JWT auth; encryption at rest |
| 5 | Topology doesn't change | IPs change when pods restart; services autoscale in and out | DNS-based service discovery; health checks; graceful connection draining |
| 6 | There is one administrator | Multiple teams deploy conflicting changes simultaneously | API contracts; consumer-driven contract testing; feature flags |
| 7 | Transport cost is zero | Serialisation, TLS handshakes, HTTP overhead all accumulate | gRPC (binary, multiplexed) for high-frequency internal calls |
| 8 | The network is homogeneous | Different languages, OS, protocol versions, MTU sizes | Standard protocols (HTTP/2, gRPC); schema registries; protocol buffers |

By the end of this module, you will have written Python code that defends against all eight.
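
As a first taste, here is a minimal standard-library sketch of the defence against fallacy 1: retries with exponential backoff. It adds random jitter so that many clients do not retry in lockstep. The function name, its parameters, and the choice to retry only on `ConnectionError` are assumptions made for this illustration, not the implementation used later in the module.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Call operation until it succeeds or max_attempts calls have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential cap (base, 2*base, 4*base, ...) bounded by max_delay,
            # then "full jitter": sleep a random amount between 0 and the cap.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


calls = {"n": 0}


def flaky() -> str:
    # Fails twice with a simulated network error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset")
    return "ok"


print(retry_with_backoff(flaky, sleep=lambda _: None))
```

Retrying is only safe when the operation is idempotent, which is why the table pairs the two. The `tenacity` package installed in Environment Setup offers a maintained alternative to hand-rolled loops like this one.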

How Python Fits Into Polyglot Microservice Architectures

Python is rarely the only language in a mature microservice shop. Understanding how Python services interoperate with Go, Java, and Rust services is essential.

Polyglot Production Architecture
─────────────────────────────────

┌──────────────────┐   gRPC (proto)   ┌──────────────────────────┐
│ API Gateway      │─────────────────►│ Auth Service (Go)        │
│ Python / FastAPI │                  │ ~1 ms latency, 64 MB RAM │
└────────┬─────────┘                  └──────────────────────────┘
         │
         │ REST / JSON
         ▼
┌──────────────────┐   Kafka events   ┌──────────────────────────┐
│ Document Service │─────────────────►│ ML Pipeline (Python)     │
│ Python / FastAPI │                  │ PyTorch inference server │
└────────┬─────────┘                  └──────────────────────────┘
         │
         │ REST / JSON
         ▼
┌──────────────────┐   gRPC (proto)   ┌──────────────────────────┐
│ Search Service   │─────────────────►│ Index Builder (Java)     │
│ Python / FastAPI │                  │ Lucene, needs heavy JVM  │
└──────────────────┘                  └──────────────────────────┘

Python wins in microservice architectures for:

  • ML and data pipelines (PyTorch, scikit-learn, pandas have no peer)
  • API gateway services (FastAPI with uvicorn is competitive with Go for I/O-bound work, where most time is spent waiting on the network rather than in the interpreter)
  • Scripting and orchestration (calling and coordinating other services)
  • Rapid prototyping (fastest path from idea to deployed service)

Python loses to Go/Rust when:

  • CPU-intensive hot paths need true parallelism (the GIL is a real constraint)
  • Memory is severely constrained (Python interpreter overhead is ~30–50 MB baseline)
  • Connection counts exceed ~10,000 concurrent (Go goroutines outperform asyncio at this scale)

The practical pattern: Python for business logic and ML inference; Go or Rust for the network-critical, CPU-intensive hot paths; all connected via gRPC and Kafka.
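
When a CPU-heavy step has to stay in Python, a common workaround for the GIL is to push it into worker processes, each with its own interpreter. The sketch below does this from async code with the standard library. Here `cpu_heavy` is a hypothetical stand-in for work such as OCR, and the round count and worker count are arbitrary.

```python
import asyncio
import hashlib
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(page: bytes, rounds: int = 20_000) -> str:
    """Stand-in for OCR: repeated hashing that keeps one core busy."""
    digest = page
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def process_pages(pages: list[bytes], workers: int = 2) -> list[str]:
    loop = asyncio.get_running_loop()
    # Each worker is a separate process with its own interpreter and GIL,
    # so pages are processed in parallel while the event loop stays responsive.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [loop.run_in_executor(pool, cpu_heavy, page) for page in pages]
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module on startup.
    results = asyncio.run(process_pages([b"page-1", b"page-2", b"page-3"]))
    print(len(results), "pages processed")
```

The trade-off is that arguments and results are pickled between processes, so this pays off for work measured in hundreds of milliseconds or more, not for tiny functions.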

Module Project: The Document Intelligence Platform

Each lesson constructs one piece of a complete, deployable Document Intelligence Platform.

| Lesson | Service Built | Technology Highlighted |
| --- | --- | --- |
| 01 - FastAPI in Depth | Upload Service | DI, lifespan, middleware, background tasks |
| 02 - gRPC with Python | Classification Service | Proto definition, streaming RPC, interceptors |
| 03 - Event-Driven Architecture | Event backbone | Kafka topics, event sourcing, saga pattern |
| 04 - Service Mesh Patterns | Resilience layer | Circuit breakers, tracing, health checks |
| 05 - API Versioning and Contracts | Versioned API | Pact tests, schema evolution, client SDK |

The code in each lesson is production-quality. It handles errors, logs correctly, and is structured for testability. You can deploy it.

Environment Setup

# Project structure
mkdir -p doc-intelligence/{upload-service,classification-service,processing-service,notification-service,shared,protos}
cd doc-intelligence

# Python environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Core service dependencies
pip install fastapi "uvicorn[standard]" httpx pydantic "pydantic-settings"
pip install grpcio grpcio-tools protobuf
pip install kafka-python confluent-kafka redis
pip install opentelemetry-api opentelemetry-sdk
pip install opentelemetry-instrumentation-fastapi
pip install opentelemetry-exporter-otlp
pip install tenacity pact-python

# Infrastructure via Docker Compose
cat > docker-compose.yml << 'YAML'
version: "3.9"

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.6.0
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      # Single broker: the offsets topic cannot use the default replication factor of 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: documents
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    ports:
      - "5432:5432"

  jaeger:
    image: jaegertracing/all-in-one:1.55
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC receiver
      - "4318:4318"    # OTLP HTTP receiver
YAML

docker compose up -d
echo "Infrastructure ready."
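
`docker compose up -d` returns as soon as the containers have started, which is not the same as the services accepting connections. A small standard-library check like the one below can confirm that the published ports are reachable. The `port_open` helper and the `SERVICES` mapping are hypothetical; the port numbers are taken from the compose file above.

```python
import socket

# Host ports published by the docker-compose.yml in this section.
SERVICES = {
    "kafka": 9092,
    "redis": 6379,
    "postgres": 5432,
    "jaeger-ui": 16686,
    "otlp-grpc": 4317,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if port_open("localhost", port) else "DOWN"
        print(f"{name:<10} :{port:<6} {status}")
```

An open port only shows that something is listening. Kafka in particular can accept TCP connections a few seconds before it is ready to serve clients, so treat this as a quick sanity check, not a health check.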

Each lesson is self-contained - you can study gRPC without having read the FastAPI lesson. But the project connects them all. Recommended approach:

  1. Read the lesson once, end to end, skimming code to understand structure
  2. Reproduce every code example from scratch (not copy-paste) - this is where understanding forms
  3. Build the mini-project at the end of each lesson
  4. Integrate your service with the one built in the previous lesson

The fastest path to mastering distributed systems is building a small one and watching it fail in interesting ways.

Let's begin.

© 2026 EngineersOfAI. All rights reserved.